Acoustic Modeling Based on Deep Conditional Random Fields

Author

  • Yasser Hifny
Abstract

State-of-the-art stochastic speech recognition systems use acoustic models based on Hidden Markov Models (HMMs). In continuous-density HMMs, the state scores are computed using Gaussian mixture models. Alternatively, Deep Neural Networks (DNNs) can be used to compute the HMM state scores, which leads to a significant improvement in recognition accuracy. Conditional Random Fields (CRFs) are undirected graphical models that maintain the Markov properties of HMMs and are formulated using the maximum entropy (MaxEnt) principle. A DNN can also be used to compute the state scores in CRFs. Placing a CRF on top of a DNN yields an acoustic model known as Deep Conditional Random Fields (DCRFs). In this paper, we present a phone recognition task based on DCRFs. Preliminary results on the TIMIT task show that DCRFs can lead to good results.
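
The idea of a CRF layer on top of DNN state scores can be illustrated with a minimal linear-chain sketch. This is not the paper's implementation: the one-hidden-layer network, the random weights, the dimensions, and all function names (`dnn_state_scores`, `path_score`, `log_partition`, `viterbi`) are assumptions chosen only for illustration. The sketch computes the log partition function with the forward algorithm and decodes the best state sequence with Viterbi.

```python
import numpy as np

def dnn_state_scores(x, W1, b1, W2, b2):
    """Toy one-hidden-layer network (an assumption, not the paper's DNN).
    Maps frames of shape (T, D) to unnormalized state scores of shape (T, S)."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

def path_score(scores, trans, path):
    """Sum of state scores and transition scores along one state sequence."""
    s = scores[0, path[0]]
    for t in range(1, len(path)):
        s += trans[path[t - 1], path[t]] + scores[t, path[t]]
    return s

def log_partition(scores, trans):
    """Forward algorithm in the log domain: log-sum of exp(path score) over all paths."""
    alpha = scores[0]
    for t in range(1, scores.shape[0]):
        # m[i, j] = alpha[i] + trans[i, j]; reduce over the previous state i
        m = alpha[:, None] + trans
        alpha = np.logaddexp.reduce(m, axis=0) + scores[t]
    return np.logaddexp.reduce(alpha)

def viterbi(scores, trans):
    """Return the highest-scoring state sequence as a list of state indices."""
    T, S = scores.shape
    delta = scores[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        m = delta[:, None] + trans
        back[t] = np.argmax(m, axis=0)
        delta = np.max(m, axis=0) + scores[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical sizes: 5 frames, 4 features, 8 hidden units, 3 states.
rng = np.random.default_rng(0)
T, D, H, S = 5, 4, 8, 3
x = rng.normal(size=(T, D))
W1, b1 = rng.normal(size=(D, H)), np.zeros(H)
W2, b2 = rng.normal(size=(H, S)), np.zeros(S)
trans = rng.normal(size=(S, S))  # CRF transition scores

scores = dnn_state_scores(x, W1, b1, W2, b2)
log_z = log_partition(scores, trans)
best = viterbi(scores, trans)
# Conditional log-probability of the best path given the observations.
log_p_best = path_score(scores, trans, best) - log_z
```

In this sketch the network output plays the role that Gaussian mixture log-likelihoods play in a continuous-density HMM, while normalization over whole state sequences (via `log_z`) is what makes the model a CRF rather than a frame-level classifier.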

Similar resources

Minimum Classification Error Training of Hidden Conditional Random Fields for Speech and Speaker Recognition

Hidden conditional random fields (HCRFs) are derived from the theory of conditional random fields with a hidden-state probabilistic framework. They directly model the conditional probability of a label sequence given observations. Compared to hidden Markov models, HCRFs provide a number of benefits in the acoustic modeling of speech signals. Prior works for training on HCRFs were accomplished with...

Segmental conditional random fields with deep neural networks as acoustic models for first-pass word recognition

Discriminative segmental models, such as segmental conditional random fields (SCRFs), have been successfully applied to speech recognition recently in lattice rescoring to integrate detectors across different levels of units, such as phones and words. However, the lattice generation has been constrained by a baseline decoder, typically a frame-based hybrid HMM-DNN system, which still suffers fro...

Automatic segmentation of English words using phonotactic and syllable information

It is difficult to demonstrate the effectiveness of prosodic features in automatic word recognition. Recently, we applied the suprasegmental concept and proposed an extra layer of acoustic modeling with syllables. Nevertheless, there is a mismatch between the syllable and the word units and that makes subsequent steps after acoustic modeling difficult. In this study, we explore English word seg...

Automatic Prosodic Labeling with Conditional Random Fields and Rich Acoustic Features

Many acoustic approaches to prosodic labeling in English have employed only local classifiers, although text-based classification has employed some sequential models. In this paper we employ linear chain and factorial conditional random fields (CRFs) in conjunction with rich, contextually-based prosodic features, to exploit sequential dependencies and to facilitate integration with lexical feat...

Learning in the Deep-Structured Conditional Random Fields

We have proposed the deep-structured conditional random fields (CRFs) for sequential labeling and classification recently. The core of this model is its deep structure and its discriminative nature. This paper outlines the learning strategies and algorithms we have developed for the deep-structured CRFs, with a focus on the new strategy that combines the layer-wise unsupervised pre-training usi...

Publication date: 2013